Factors Influencing the Surprising Instability of Word Embeddings
Despite the recent popularity of word embedding methods, there is only a
small body of work exploring the limitations of these representations. In this
paper, we consider one aspect of embedding spaces, namely their stability. We
show that even relatively high frequency words (100-200 occurrences) are often
unstable. We provide empirical evidence for how various factors contribute to
the stability of word embeddings, and we analyze the effects of stability on
downstream tasks.
Comment: NAACL HLT 2018
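To give a concrete feel for what "stability" means here, the sketch below quantifies it as the overlap between a word's nearest neighbours in two embedding spaces (for example, spaces trained with different random seeds or on different data samples). This is a minimal illustration of the general idea, not necessarily the paper's exact metric; the toy vocabulary and random vectors are stand-ins for real trained embeddings.

```python
# Minimal sketch: stability of a word as the percent overlap between its
# top-k nearest neighbours in two embedding spaces. Illustrative only.
import numpy as np

def nearest_neighbours(word, vocab, vectors, k=10):
    """Return the k most cosine-similar words to `word` (excluding itself)."""
    idx = vocab.index(word)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    order = np.argsort(-sims)
    return [vocab[i] for i in order if i != idx][:k]

def stability(word, vocab, vectors_a, vectors_b, k=10):
    """Percent overlap of the word's top-k neighbours across two spaces."""
    nn_a = set(nearest_neighbours(word, vocab, vectors_a, k))
    nn_b = set(nearest_neighbours(word, vocab, vectors_b, k))
    return 100.0 * len(nn_a & nn_b) / k

# Toy usage: random vectors stand in for two independently trained spaces.
vocab = ["cat", "dog", "car", "truck", "apple", "banana"]
rng = np.random.default_rng(0)
space_a = rng.normal(size=(len(vocab), 50))
space_b = rng.normal(size=(len(vocab), 50))
print(stability("cat", vocab, space_a, space_b, k=3))
```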
SLATE: A Super-Lightweight Annotation Tool for Experts
Many annotation tools have been developed, covering a wide variety of tasks
and providing features like user management, pre-processing, and automatic
labeling. However, all of these tools use Graphical User Interfaces, and often
require substantial effort to install and configure. This paper presents a new
annotation tool that is designed to fill the niche of a lightweight interface
for users with a terminal-based workflow. Slate supports annotation at
different scales (spans of characters, tokens, and lines, or a document) and of
different types (free text, labels, and links), with easily customisable
keybindings, and unicode support. In a user study comparing with other tools it
was consistently the easiest to install and use. Slate fills a need not met by
existing systems, and has already been used to annotate two corpora, one of
which involved over 250 hours of annotation effort.
Comment: To appear at ACL as a demo
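To make the supported annotation scales and types concrete, here is a purely hypothetical data model covering spans, labels, free-text notes, and links. It is an expository sketch only, not SLATE's actual file format or API.

```python
# Hypothetical data model for span-based annotation at several scales,
# with categorical labels, free text, and links between spans.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Span:
    """A region of a document: character, token, line, or whole-document scale."""
    scale: str                 # "character" | "token" | "line" | "document"
    start: Tuple[int, int]     # (line, offset) where the span starts
    end: Tuple[int, int]       # (line, offset) where the span ends

@dataclass
class Annotation:
    """A label or free-text note on a span, optionally linked to another span."""
    span: Span
    label: Optional[str] = None    # categorical label, e.g. "PER"
    text: Optional[str] = None     # free-text comment
    link: Optional[Span] = None    # e.g. a coreference link to another span

# Example: mark tokens 3-4 of line 0 as a person mention.
mention = Annotation(span=Span(scale="token", start=(0, 3), end=(0, 4)), label="PER")
print(mention)
```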
Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection
Linguistically diverse datasets are critical for training and evaluating
robust machine learning systems, but data collection is a costly process that
often requires experts. Crowdsourcing the process of paraphrase generation is
an effective means of expanding natural language datasets, but there has been
limited analysis of the trade-offs that arise when designing tasks. In this
paper, we present the first systematic study of the key factors in
crowdsourcing paraphrase collection. We consider variations in instructions,
incentives, data domains, and workflows. We manually analyzed paraphrases for
correctness, grammaticality, and linguistic diversity. Our observations provide
new insight into the trade-offs between accuracy and diversity in crowd
responses that arise as a result of task design, providing guidance for future
paraphrase generation procedures.
Comment: Published at ACL 2017
Using Paraphrases to Study Properties of Contextual Embeddings
We use paraphrases as a unique source of data to analyze contextualized
embeddings, with a particular focus on BERT. Because paraphrases naturally
encode consistent word and phrase semantics, they provide a unique lens for
investigating properties of embeddings. Using the Paraphrase Database's
alignments, we study words within paraphrases as well as phrase
representations. We find that contextual embeddings effectively handle
polysemous words, but give synonyms surprisingly different representations in
many cases. We confirm previous findings that BERT is sensitive to word order,
but find slightly different patterns than prior work in terms of the level of
contextualization across BERT's layers.
Comment: Published at NAACL 2022
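As a concrete illustration of the kind of comparison described above, the sketch below extracts a word's contextual BERT representation from each of two paraphrases and measures their cosine similarity with the Hugging Face transformers library. The model name, subword pooling choice, and example sentences are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: compare the contextual embedding of a shared word across two
# paraphrases using BERT's final hidden layer (mean over subword pieces).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence, word):
    """Mean of the final-layer vectors for the subword pieces of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    target = sentence.lower().split().index(word)   # word position in the sentence
    piece_idx = [i for i, w in enumerate(enc.word_ids(0)) if w == target]
    return hidden[piece_idx].mean(dim=0)

s1 = "the movie was surprisingly good"
s2 = "the film was surprisingly good"
v1 = word_embedding(s1, "surprisingly")
v2 = word_embedding(s2, "surprisingly")
print(torch.cosine_similarity(v1, v2, dim=0).item())
```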